ARTIST: A Silicon Assembler for Mesh Arrays


ABSTRACT 


This  paper  describes  a  VLSI  layout  assembler,  ARTIST,  under  development  at 
Penn  State.  ARTIST  performs  transistor  placement  and  interconnection  within  a 
module.  Novel  ideas  used  in  the  design  of  the  assembler  are  described.  A  modular 
software  design  is  used  so  that  we  can  easily  try  different  approximation  algorithms  for 
transistor  placement.  A  comparison  between  simulated  annealing  and  a  totally  random 
approach  is  presented.  Surprisingly,  the  random  approach  is  better  for  realistic  running 
times.  Finally,  a  hybrid  approximation  algorithm  for  transistor  placement  is  described 
and  is  shown  to  be  better  than  either  of  the  other  two  algorithms. 
INTRODUCTION

ARTIST is a tool under development at Penn State which generates a layout, in
CMOS mesh array form [Be], for a module from its formal description. The name
ARTIST, rather than the sometimes overused name "silicon compiler," was chosen due to
the tightness between the formal description and the layout generated by ARTIST from
that description. ARTIST is a key part of a CAD system under development at Penn
State [OI1]. Other tools in this CAD system include: LOGICIAN, a tool for module
generation which performs multi-level logic reduction [BOI]; COMPOSER, a tool for
module placement within the target architecture; SIMULATE, a simulation tool [OI2];
and V, a layout verification tool [RI].

ARTIST  owes  its  existence  to  both  pragmatic  and  academic  reasons.  The  academic 
reason  was  to  provide  a  test  bed  so  that  different  approximation  algorithms  for 
efficiently  finding  near  optimal  layouts  could  be  tried.  The  pragmatic  reason  was  the 
growing  need  to  be  able  to  quickly  generate  layouts  for  some  of  the  VLSI  architecture 
projects  ongoing  at  Penn  State. 

The part of our design system surrounding ARTIST is shown in Figure 1. This
paper first introduces the language in which the formal description of the module can be
specified. ARTIST actually accepts only a restricted version of the full language. Thus,
a Parser is used to transform a description which uses the full language into a
description using only the language subset. LOGICIAN, the multi-level logic minimization tool
which feeds ARTIST as shown in Figure 1, outputs the module description using this
language subset. The layout program, ARTIST, which performs transistor placement
and interconnection, is then discussed. Finally, several layout optimization algorithms and
their performance are presented.


[Figure 1: block diagram with blocks labeled "Formal Description of the Module",
"LOGICIAN", "ARTIST", "Layout of the Module", and "LEd/Magic".]

Figure 1. Portion of Design System Surrounding ARTIST


THE  LANGUAGE 


We  wanted  the  form  used  to  describe  the  layout  to  be  generated  by  ARTIST  to 
satisfy  the  following. 

1)  The  form  should  allow  a  close  relationship  between  a  formal  description  of 
a  module  and  the  layout  generated  from  that  description. 

2) The form should not in itself restrict the range of layouts which can be
generated.

3) Descriptions expressed in other forms (e.g., a net list) should be easily
translatable into descriptions expressed in our form.

To achieve these goals, a procedure-based language was developed. We will briefly
describe the language in this paper. A full description can be found in [Ow].

The description of a module consists of a set of procedures. Each procedure
consists of a set of statements. The primary statement of the language is the
assignment statement. As in other languages, the assignment statement supplies
the information that determines the value a given variable has at any given time.
However, the assignment statement of our language differs from that of other
languages in several important ways. First, its syntax is different. It is given by

< e > a = b ;

where e is a boolean expression and a and b are either variables or boolean constants
(1 for true and 0 for false). A boolean expression is an expression formed using
only & (and), | (or), and ! (negation/not). The expression e is the assertion part
and the equality a = b is the equality part of the statement. An assignment
statement is interpreted in the following way: if e is true, then a and b have the same
value. An assignment statement for which the assertion part is true is said to be in an
active state. Second, the semantics of the assignment statement, given by the
following, are different.

1) The symbol = in the equality part implies equality, not normal assignment;
that is, either variable can be changed so as to make the equality
true. The symbol = was chosen because of the lack of the symbol ≡ in
most text entry character sets.

2) The order of the statements within a procedure has no bearing on the
manner in which they are interpreted, including the order in which they
are executed.

3) The statements are not executed as the term normally applies. If the
assertion part of the statement is true, the equality part is made and kept
true, by possibly changing the value of either variable, for as long as the
assertion part is true. If this condition cannot be obtained, an error is
assumed.

4) Unless it is necessary to change the value of a variable to either directly or
indirectly satisfy the equality part of an active assignment statement, the
value of a variable is assumed to remain unchanged.


There are two primitive forms of the assignment statement, as shown below.

< g > s = d ;      N channel

< !g > s = d ;     P channel

where s, d, and g are variables and ! is boolean negation (not). Note that these two
primitive forms correspond to an N channel and a P channel field effect transistor.


Assignment statements can be manipulated in the following ways. The statements

< e1 > a = b ;  < e2 > a = b ;

infer the statement

< e1 | e2 > a = b ;

and vice-versa. Also, if b is not otherwise used, the statements

< e1 > a = b ;  < e2 > b = c ;

infer the statement

< e1 & e2 > a = c ;

and vice-versa.
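The two inference rules can be illustrated with a minimal Python sketch. The names Stmt, parallel, and series are illustrative only and are not part of ARTIST or Parser; guards are kept as plain strings, and the caller is assumed to have checked that the intermediate variable of a series combination is not otherwise used.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Stmt:
    """One assignment statement  < guard > left = right ;  (hypothetical model)."""
    guard: str
    left: str
    right: str


def parallel(s1: Stmt, s2: Stmt) -> Stmt:
    """< e1 > a = b ; < e2 > a = b ;  infers  < e1 | e2 > a = b ;"""
    if {s1.left, s1.right} != {s2.left, s2.right}:
        raise ValueError("parallel rule needs the same equality part")
    return Stmt(f"({s1.guard}) | ({s2.guard})", s1.left, s1.right)


def series(s1: Stmt, s2: Stmt) -> Stmt:
    """< e1 > a = b ; < e2 > b = c ;  infers  < e1 & e2 > a = c ;
    Assumes b is not otherwise used (not checked here)."""
    if s1.right != s2.left:
        raise ValueError("series rule needs a shared intermediate variable")
    return Stmt(f"({s1.guard}) & ({s2.guard})", s1.left, s2.right)


combined_parallel = parallel(Stmt("e1", "a", "b"), Stmt("e2", "a", "b"))
combined_series = series(Stmt("e1", "a", "b"), Stmt("e2", "b", "c"))
print(combined_parallel)
print(combined_series)
```

Since both rules also hold in the reverse direction, a tool could equally split a guard of the form e1 | e2 or e1 & e2 back into primitive statements; that direction is not shown here.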


Besides the assignment statement, our language also supports a procedure call
statement. The effect of a procedure call is that normally associated with a textual
macro call or an Algol procedure call. That is, the effect is that obtained
by replacing the procedure call by the body of the procedure, where formal
parameters have been renamed to their corresponding actual parameters and other
variables renamed as necessary.

The language has also been extended to support loop statements (for which the
range is known), conditional statements (for which the condition can be computed at
compile time), and array variables (bit vectors). To more fully illustrate the language,
the complete formal description of a typical module of modest size, the mcell [OI1, IO],
which performs a restricted base 4 signed digit multiplication, is given in Figure 2.


mcell (m0, m1, m2, x0, x1, x2, r0, r1, r2, q0, q1)

{

< x0 & r0 > t0 = 0;

< !x0 | !r0 > t0 = 1;

< t0 & q0 > m0 = 0;

< !t0 & !q1 > m0 = 1;

< r1 & x0 > t1 = 0;

< !r1 | !x0 > t1 = 1;

< r0 & x1 > t2 = 0;

< !r0 | !x1 > t2 = 1;

< t1 > t3 = 0;

< !t1 > t3 = 1;

< t2 > t4 = 0;

< !t2 > t4 = 1;

< (t1 | t4) & (t2 | t3) & q0 > m1 = 0;

< ((!t1 & !t4) | (!t2 & !t3)) & !q1 > m1 = 1;

< (x0 | x1) & r2 > t5 = 0;

< (!x0 & !x1) | !r2 > t5 = 1;

< (r0 | r1) & x2 > t6 = 0;

< (!r0 & !r1) | !x2 > t6 = 1;

< (t5 | x2) & (t6 | r2) & q0 > m2 = 0;

< ((!t5 & !x2) | (!t6 & !r2)) & !q1 > m2 = 1;

}


Figure  2.  Description  for  mcell 


THE  PARSER 


To reduce the overall complexity of ARTIST and for reasons which will become
clear later, it was decided that ARTIST itself would only accept a semantic and
syntactic subset of our language. Briefly, ARTIST itself does not support procedure calls (or
any other of the extended features of the language). Furthermore, for each defined
variable, a, ARTIST requires exactly two consecutive statements of the form

< ed > a = 0;

< eu > a = 1;

Variable a may otherwise be used only in the assertion part of an assignment
statement. Variables which are not defined variables, referred to as used
variables, may only appear in the assertion part of an assignment statement. Boolean
negation may not be used in expressions ed and eu, except that all the variables of eu must
be negated. Note that the description of mcell given in Figure 2 conforms to these
restrictions.

For the most part, having ARTIST accept only a subset of the language is not
much of a problem for layout descriptions generated by our other CAD tools (e.g.,
LOGICIAN [BOI]). It is fairly easy to write these tools so that they produce a description
using only the subset. However, for human generated input or for input translated from
a graphic form (e.g., schematic capture), very often the conditions imposed by the subset
seem unnatural and unduly restrictive. To get around this problem, a standard full
language to sublanguage translator, Parser, was developed.

As illustrated in the previous section, a statement can be manipulated using a fairly
simple set of rules. For example, it is fairly obvious that the set of statements

< a > t = 0;

< b > c = t;

< !a > c = 1;

< !b > c = 1;

can be transformed into

< a & b > c = 0;

< !a | !b > c = 1;

assuming that variable t is otherwise unused. Note that the transformed set of statements is
acceptable to ARTIST while the original set is not. At first, it would seem that the
tasks of Parser can be trivially performed. However, this is not always the case.

The first nontrivial situation encountered by Parser arises from shared
terms created by bidirectional paths. For example, consider the following set of
statements.

< a > v = 0;

< b > w = 0;

< c > v = w;

< d > x = v;

< e > x = w;

Translating these statements to one of the form

< f > x = 0;

cannot be performed by a simple pairwise serial or parallel combining of the original set
of statements. Parser handles this case by first replicating the bidirectional path (e.g.,
as implied by v = w) to produce the following set of statements.

< a > v = 0;

< b > u = 0;

< c > v = u;

< d > x = v;

< b > w = 0;

< a > y = 0;

< c > w = y;

< e > x = w;

These statements can now be translated using normal series and parallel combinations
to produce

< d & (a | c & b) | e & (b | c & a) > x = 0;

assuming that variables v, u, w, and y are otherwise unused. An important
implication of Parser's elimination of shared terms is that descriptions which contain "pass"
transistors are converted to logically equivalent descriptions which do not.
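One way to check the result of this example is to treat the original statements as a switch network and enumerate every simple path from x to the constant 0. The Python sketch below does this; it is an illustration under our own assumptions (the data layout and the function names are hypothetical) and is not a description of how Parser is implemented.

```python
from itertools import product

# Each statement  < g > p = q ;  is modeled as a switch (g, p, q).
# The node "0" stands for the boolean constant 0.
SWITCHES = [("a", "v", "0"), ("b", "w", "0"), ("c", "v", "w"),
            ("d", "x", "v"), ("e", "x", "w")]


def simple_paths(switches, start, goal):
    """List the gate variables along every simple path from start to goal."""
    paths = []

    def walk(node, visited, gates):
        if node == goal:
            paths.append(gates)
            return
        for gate, p, q in switches:
            # Switches are bidirectional, so try both orientations.
            for here, there in ((p, q), (q, p)):
                if here == node and there not in visited:
                    walk(there, visited | {there}, gates + [gate])

    walk(start, {start}, [])
    return paths


def conducts(paths, values):
    """True if at least one path has all of its switches closed."""
    return any(all(values[g] for g in path) for path in paths)


def translated_guard(v):
    """The guard quoted in the text: d & (a | c & b) | e & (b | c & a)."""
    return bool((v["d"] and (v["a"] or (v["c"] and v["b"]))) or
                (v["e"] and (v["b"] or (v["c"] and v["a"]))))


PATHS = simple_paths(SWITCHES, "x", "0")
print(PATHS)
AGREE = all(
    conducts(PATHS, dict(zip("abcde", bits))) ==
    translated_guard(dict(zip("abcde", bits)))
    for bits in product([False, True], repeat=5)
)
print(AGREE)
```

The four paths found (d-a, d-c-b, e-b, e-c-a) correspond term by term to the guard above, with the switch c traversed in one direction on two of them and in the other direction on the other two, which is the replication described in the text.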
The second nontrivial situation encountered by Parser is caused by the use of
boolean negation. The original language allows the free use of negation in the assertion.
However, the subset accepted by ARTIST greatly restricts the use of negation. Parser
handles this case by first pushing the use of negation to the literals of the expression.
For example,

< !(a & (!b | !c)) > x = 0;

is translated into

< !a | b & c > x = 0;

Illegal variable negation is then corrected by introducing a dummy variable. Thus, the
statement

< !a | b & c > x = 0;

is translated into

< d | b & c > x = 0;

< a > d = 0;

< !a > d = 1;

The effect of this transformation is that Parser introduces an inverter to generate the
needed form (inverted/noninverted) of a variable. Parser does not attempt to optimize
the translation of illegal negations. Optimization is provided by other tools (e.g.,
LOGICIAN).
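The two steps just described (pushing negation to the literals, then replacing an illegal negated literal by a dummy variable driven by an inverter) can be sketched in Python as follows. The tuple encoding of expressions, the function names, and the dummy naming scheme (a_n for the inverse of a) are assumptions made for illustration; the text above uses the name d for the dummy.

```python
def push_not(e, negate=False):
    """Push negation down to the variables using De Morgan's laws.
    Expressions are tuples: ("var", name), ("not", e), ("and", l, r), ("or", l, r)."""
    kind = e[0]
    if kind == "var":
        return ("not", e) if negate else e
    if kind == "not":
        return push_not(e[1], not negate)
    # Under a pending negation, & and | swap roles.
    op = {"and": "or", "or": "and"}[kind] if negate else kind
    return (op, push_not(e[1], negate), push_not(e[2], negate))


def replace_negated(e, dummies):
    """Replace every negated literal by a dummy variable (pulldown case).
    `dummies` maps the original variable to its hypothetical dummy name."""
    kind = e[0]
    if kind == "var":
        return e
    if kind == "not":
        name = e[1][1]
        return ("var", dummies.setdefault(name, name + "_n"))
    return (kind, replace_negated(e[1], dummies), replace_negated(e[2], dummies))


def show(e):
    """Render an expression in the paper's notation (fully parenthesized)."""
    kind = e[0]
    if kind == "var":
        return e[1]
    if kind == "not":
        return "!" + show(e[1])
    symbol = "&" if kind == "and" else "|"
    return "(" + show(e[1]) + " " + symbol + " " + show(e[2]) + ")"


a, b, c = ("var", "a"), ("var", "b"), ("var", "c")
original = ("not", ("and", a, ("or", ("not", b), ("not", c))))
pushed = push_not(original)
dummies = {}
legal = replace_negated(pushed, dummies)

print("<", show(pushed), "> x = 0;")
print("<", show(legal), "> x = 0;")
for var, dummy in dummies.items():
    # Each dummy is generated by an inverter, as in the text.
    print("<", var, ">", dummy, "= 0;")
    print("< !" + var, ">", dummy, "= 1;")
```

As in the text, no attempt is made to minimize the number of inverters beyond sharing one dummy per negated variable.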

MESH  ARRAYS 


Mesh arrays were originally conceived as a structured aid to the hand layout of
multi-level CMOS logic [Be]. It is surprisingly simple and efficient to create the layout
for a mesh array by modifying a single general template. Mesh arrays are, at present,
implemented using a two level metal CMOS process as supported by MOSIS.
Physically, horizontal first level metal segments are used for transistor interconnections, power,
and ground. Hence, logically, row segments of the mesh are allocated to variables.
Physically, consecutive vertical diffusion segments are used to form the pullup (pull the
gate's output toward Vdd) and pulldown (pull the gate's output toward Gnd) part of
each gate. Hence, logically, columns of the mesh are allocated to statements. Vertical
second level metal segments are used to connect the pullup and pulldown part of each
gate and to distribute module inputs and outputs.

The  physical  structure  of  mesh  arrays  can  be  described  in  more  detail  through  a 
constructive  description.  The  mesh  array  for  a  circuit  consisting  of  a  single  N  channel 
transistor  is  illustrated  in  Figure  3. 


< g > s = d ;

Figure 3. Single Transistor Mesh Array

A horizontal first level metal segment, which carries the signal associated with
variable g, is connected to the gate of the transistor (via a poly contact).

The  mesh  array  for  a  circuit  consisting  of  a  parallel  connection  of  two  subcircuits  is 
illustrated  in  Figure  4. 


Figure  4.  Parallel  Connected  Subcircuits 


Since the two subcircuits share the same well, they both must consist of the
interconnection of only N channel (P well) transistors or of only P channel (N well) transistors.
Note that, because of conflicts in the allocation of row segments, both subcircuits may
have to be stretched in the vertical direction.

Likewise, the mesh array for a circuit consisting of a series connection of two
subcircuits is illustrated in Figure 5.


< e1 > a = t ;

< e2 > t = b ;

< e1 & e2 > a = b ;


Figure 5. Series Connected Subcircuits


Again, since the two subcircuits share the same well, they both must consist of the
interconnection of only N channel (P well) transistors or of only P channel (N well)
transistors.

Finally, a gate, which consists of the interconnection of two subcircuits, is illustrated in
Figure 6.


< eu > a = 1;

< ed > a = 0;


Figure  6.  Single  Gate 


The pullup circuit must consist of only P channel (N well) transistors and the pulldown
circuit must consist of only N channel (P well) transistors. To complete our description,
the mesh array for a two input nor gate is illustrated in Figure 7.

Figure 7. Two Input Nor Gate

Now that the physical structure of mesh arrays has been described, the reasons
behind the restricted language accepted by ARTIST should be apparent. The two
statements associated with a defined variable describe the pullup and pulldown part of the
CMOS gate which generates that variable. The pullup part of the gate, specified by eu,
contains only P channel transistors and the pulldown part of the gate, specified by ed,
contains only N channel transistors. Furthermore, only literals of eu and ed can be
negated. However, a variable which is negated, which would imply a P channel
transistor, cannot appear in ed. Also, a variable which is not negated, which would imply an N
channel transistor, cannot appear in eu.

Mesh arrays do not structurally support pass transistors. Hence, bidirectional paths
are not permitted. Defined variables represent the outputs of the gates of the module
and used variables represent inputs supplied to the module. Figure 8 illustrates a hand
generated mesh array layout for the formal description of the mcell module given in
Figure 2.

Figure 8. Hand Generated mcell Layout

This layout is seventeen columns by twenty-four rows and took about twenty-four man
hours to create.


ARTIST 

Briefly, ARTIST, by reading the description of the module, creates an initial
internal representation of the layout to be generated. It then manipulates the internal
representation, trying to reduce the layout's size. Finally, ARTIST generates the actual
layout, in a format compatible with either LEd or Magic, from the final internal
representation. ARTIST manipulates the internal representation by performing some number of
trials. For each trial, ARTIST generates a new configuration of the internal
representation, determines the layout size of the new configuration, and then possibly replaces the
old configuration with the new configuration. Different versions of ARTIST can be
characterized by how they generate successive new configurations and how they decide
to replace an old configuration by a new configuration.

One of the most important decisions made during the design of ARTIST is
concerned with how a mesh array is internally represented. It is generally more efficient to
manipulate a high level internal representation (symbolic) than a low level internal
representation (paint rectangles). Hence, using a high level internal representation
usually allows more configurations to be evaluated in a given period of time. However,
since a low level description is usually tighter, it is less likely to hide possible
optimizations and is at the same time easier to use to determine the layout size of a
configuration. One advantage of our overall approach is that the description itself
(internally the description's parse tree) can be used, almost without any information loss,
to represent the mesh array to be generated almost directly. This high level internal
representation is particularly easy to manipulate. However, we can still efficiently
determine the layout size for a given configuration.

One of the principal functions performed by ARTIST is the allocation of rows and
columns of a mesh array to the subcircuits of the module. ARTIST performs this
allocation using a module description in the following way. At the gate level, a mesh array
can be viewed as illustrated in Figure 9.


Figure 9. Block Diagram of mcell Layout

The pullup and pulldown part of each gate are allocated, respectively, consecutive
columns in the upper (N well) and lower (P well) areas of the array. Hence, the pullup
(pulldown) parts of any two gates cannot overlap. Because of the second level metal
wire connecting the two together, the areas allocated to the pullup and pulldown part of
each gate must have at least one common column. These characteristics lead to the
following observation. The order of the statement pairs associated with each of the
defined variables can be used to specify the order of the gates and, consequently, the
columns to be allocated to each pullup and pulldown.


The order of the operands for each operation supplies most of the remaining
information needed by ARTIST. In the case of &, the row segments allocated to the left
operand must precede the row segments allocated to the right operand. In the case
of |, the column segments allocated to the left operand must precede the column
segments allocated to the right operand. The final information needed by ARTIST is how
row segments are to be allocated. This information is not implied by the module
description; it is determined as the layout is generated.

For each well, ARTIST generates a layout row by row. For each row, row
segment allocation and layout generation are performed as follows (although we have
omitted many details, the following overview is conceptually correct).

1) Initially each of the expressions associated with the pullup part, if the N
well area is being generated, or the pulldown part, if the P well area is
being generated, of each gate is activated.

2) If the operation | is activated, then both of its operands are activated.

3) If the operation & is activated, then its left operand is activated.

4) If a variable is assigned to a row segment, it is deactivated.

5) If the left operand of the operation & is deactivated, then the right
operand is activated.

6) If both operands of an operation are deactivated, then the operation is
deactivated.

7) Generation of the layout for that well terminates when all operations and
operands have been deactivated.

Before each row is allocated, ARTIST scans the row for variables which are active.
Based on this information, ARTIST allocates row segments to some of the active
variables, generates the layout for that row, and then deactivates the variables which were
assigned. If an active variable cannot be assigned to a row segment, the circuit associated
with that variable is stretched into the next row.
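The activation rules above can be simulated on an expression tree with a short Python sketch. This is an idealized illustration, not ARTIST's implementation: the Node class and function names are hypothetical, and the sketch assumes that every active variable can be given a row segment in the current row, so conflicts and stretching are ignored.

```python
class Node:
    """Expression tree node: a variable, or an & / | operation (hypothetical model)."""

    def __init__(self, kind, left=None, right=None, name=None):
        self.kind, self.left, self.right, self.name = kind, left, right, name
        self.assigned = False  # only meaningful for variables (rule 4)


def var(name):
    return Node("var", name=name)


def AND(left, right):
    return Node("and", left, right)


def OR(left, right):
    return Node("or", left, right)


def deactivated(node):
    """Rules 4 and 6: a variable is deactivated once assigned; an operation
    is deactivated once both of its operands are."""
    if node.kind == "var":
        return node.assigned
    return deactivated(node.left) and deactivated(node.right)


def active_vars(node):
    """Variables that are active when `node` itself is active."""
    if node.kind == "var":
        return [] if node.assigned else [node]
    if node.kind == "or":  # rule 2: both operands are active
        return active_vars(node.left) + active_vars(node.right)
    if not deactivated(node.left):  # rule 3: left operand first
        return active_vars(node.left)
    return active_vars(node.right)  # rule 5: then the right operand


def allocate_rows(root):
    """Return the variable names assigned in each successive row (rule 7 ends the loop)."""
    rows = []
    while not deactivated(root):
        row = active_vars(root)
        for v in row:
            v.assigned = True  # idealized: every active variable gets a segment
        rows.append([v.name for v in row])
    return rows


# Pulldown expression of m1 from Figure 2: (t1 | t4) & (t2 | t3) & q0
m1_pulldown = AND(AND(OR(var("t1"), var("t4")), OR(var("t2"), var("t3"))), var("q0"))
print(allocate_rows(m1_pulldown))
```

Under this idealization the number of rows an expression needs is 1 for a variable, the sum of its operands' rows for &, and the larger of its operands' rows for |. A real allocator would assign only some of the active variables in each row, which is where the choice of row allocation algorithm enters.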

While many row allocation algorithms have been tried (e.g., First Fit, Most Fit,
FIFO), one conclusion seems clear. Any reasonable row allocation algorithm appears to
work as well as any other. This seems to be the result of having to deal with very few
active variables (as compared to the total number of variables) at each row.

ARTIST is being used on a regular basis at Penn State. Figure 10 illustrates the
mesh array layout generated by ARTIST for the formal description of the mcell module
given in Figure 2.

Figure 10. ARTIST Generated mcell Layout

This layout is seventeen columns by eighteen rows and took about one minute to create.
Note that ARTIST created a smaller layout in far less time than the hand created
layout. In creating the layout given in Figure 10, ARTIST performed about forty trials per
second on a 68020 based workstation, a VALID Logic SCALDStar. Hence, the layout
was generated using about two and a half thousand trials. Doubling the complexity of the
layout to be created approximately halves the number of trials which can be performed
per second, and about twice as many trials must be performed to obtain a layout of
similar size optimality.
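The arithmetic behind these figures, and its consequence for larger modules, can be made explicit with a small Python sketch. The exact trial count of 2400 is our own rounding of "about forty trials per second for about one minute"; the function name is illustrative.

```python
def run_time_seconds(trials, trials_per_second):
    """Run time implied by a trial count and a trial rate."""
    return trials / trials_per_second


BASE_RATE = 40.0    # trials per second reported for the mcell
BASE_TRIALS = 2400  # assumed: 40 trials/s for about 60 s

base_time = run_time_seconds(BASE_TRIALS, BASE_RATE)

# Doubling the complexity halves the rate and doubles the trials needed,
# so the run time grows by about a factor of four.
doubled_time = run_time_seconds(2 * BASE_TRIALS, BASE_RATE / 2)

print(base_time, doubled_time, doubled_time / base_time)
```

Taken at face value, the two observations together suggest that run time grows roughly with the square of layout complexity, at least over the range where they hold.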


LAYOUT OPTIMIZATIONS


While in a state of constant refinement, ARTIST is in use at Penn State for
ongoing architecture research and class projects in VLSI courses. However, as pointed out in
the introduction, ARTIST owes its existence to academic reasons as well as pragmatic
reasons. Toward fulfilling the academic reasons, we are experimenting with several
different versions of the approximation algorithms used to manipulate the internal
description. These algorithms generate the new trial configuration by switching the
order of statement and operand pairs. While our results are preliminary, they are
interesting.

We first wanted to develop a baseline approximation algorithm by which other
algorithms could be judged. The natural candidate for such an algorithm seemed to us to
be a totally random algorithm. This algorithm, our Monte Carlo algorithm, uses a
totally random configuration for each new trial. We then implemented what we thought
would be the tried and true approximation algorithm, simulated annealing. Simulated
annealing generates a new trial configuration by incrementally changing the old
configuration, that is, by changing the order of only one statement or operand pair.
Figure 11 gives a comparison of the layouts obtained by the simulated annealing and Monte
Carlo algorithms for a typical module of two hundred transistors.

Figure 11. Random vs. Annealing

Each line of Figure 11 represents the best configuration found using the indicated
number of trials. Much to our surprise, for ten thousand trial configurations the
random algorithm was better. Furthermore, this result could be consistently reproduced for
different modules and cooling scheduling strategies.
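The structural difference between the two algorithms compared above can be sketched in Python. This is a toy model under stated assumptions, not ARTIST's code: a configuration is reduced to a single permutation, toy_cost is a stand-in for the layout size, and the acceptance rule and geometric cooling schedule are the textbook form of simulated annealing with arbitrarily chosen parameters.

```python
import math
import random


def toy_cost(order):
    """Stand-in for layout size (an assumption): total distance between neighbors."""
    return sum(abs(a - b) for a, b in zip(order, order[1:]))


def monte_carlo(cost, n_items, trials, rng):
    """Each trial evaluates a totally random configuration; keep the best seen."""
    best, best_cost = None, math.inf
    for _ in range(trials):
        cand = list(range(n_items))
        rng.shuffle(cand)
        c = cost(cand)
        if c < best_cost:
            best, best_cost = cand, c
    return best, best_cost


def anneal(cost, n_items, trials, rng, t0=5.0, cooling=0.995):
    """Each trial swaps one pair in the current configuration."""
    cur = list(range(n_items))
    rng.shuffle(cur)
    cur_cost = cost(cur)
    best, best_cost = cur[:], cur_cost
    temp = t0
    for _ in range(trials):
        cand = cur[:]
        i, j = rng.sample(range(n_items), 2)
        cand[i], cand[j] = cand[j], cand[i]
        c = cost(cand)
        # Accept improvements always, and worsenings with Boltzmann probability.
        if c <= cur_cost or rng.random() < math.exp((cur_cost - c) / temp):
            cur, cur_cost = cand, c
            if c < best_cost:
                best, best_cost = cand[:], c
        temp = max(temp * cooling, 1e-9)  # geometric cooling with a floor
    return best, best_cost


rng = random.Random(0)
print(monte_carlo(toy_cost, 8, 500, rng))
print(anneal(toy_cost, 8, 500, rng))
```

Which strategy does better depends entirely on the cost landscape; the toy cost here is not meant to reproduce the behavior reported in Figure 11.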

We offer the following analogy as an explanation for this phenomenon. Suppose a
person was standing, blindfolded, in a rectangularly tiled room. The person is trying to
find the smoothest tile in the room. The person may move from tile to tile. After
moving to a tile, the person may reach down and touch the tile to determine its
smoothness.

If the smoothness of a given tile is totally independent of the smoothness of the
other tiles in the room, random walking (each move must be between adjacent tiles) or
random leaps (each move need not be between adjacent tiles) is as good as any
nondeterministic strategy. Annealing succeeds when the smoothness of a given tile is not
independent of the smoothness of the other tiles in the room. Using annealing, a person
would compare the tile they are standing on with one of the adjacent tiles. They would
then tend to move to the smoother of the two.

However, the very reason (dependence) that seems to make annealing work can
produce very long search times. For example, suppose the tiles in one quadrant of the
room have nearly the same smoothness, that the tiles in the other three quadrants
are all much rougher than the tiles in the first quadrant, and that many local optimums
exist. Now start the person searching as far from the smooth quadrant as possible. To
find the smooth quadrant, the person may (and probably does) spend a lot of time
stumbling around the other three quadrants, since they can in effect see only the smoothness
of the tiles in their immediate vicinity. However, if the person can leap randomly
around the room, they would find one of the tiles in the smooth quadrant after only
sixteen leaps with reasonably high probability. While our example seems contrived, it
reflects to a remarkably high degree the search space as seen by ARTIST: a few (as
compared to the entire search space) relatively large global optimums (as compared to
the entire search space) surrounded by many small local optimums.
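The "reasonably high probability" in the analogy can be quantified. Assuming each leap lands on a uniformly random tile, independently of the others, and that the smooth quadrant covers one quarter of the room, the chance of landing in it at least once in sixteen leaps is 1 - (3/4)^16. A short Python check (the function name is illustrative):

```python
def hit_probability(fraction, leaps):
    """Probability that at least one of `leaps` independent uniform leaps
    lands in a region covering `fraction` of the room."""
    return 1.0 - (1.0 - fraction) ** leaps


p16 = hit_probability(0.25, 16)
print(round(p16, 3))  # 0.99
```

So under these assumptions the leaping person reaches the smooth quadrant within sixteen leaps about 99 percent of the time.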

The advocates of annealing would point out that after enough tries, the nonleaping
person would find the smooth quadrant and would then find a solution even better than
the random leaping person would have been able to find after the same number of tries.
While we don't dispute this point, we offer the observation that many hours of
stumbling around may be necessary to reach it.

Our first attempt to improve the efficiency of annealing was to use random
initial trials and then to switch over to simulated annealing. The problem with this
approach was in developing a good mechanism to determine when the switch between the
two modes should take place. Our second attempt solves this problem. Simulated
annealing is used throughout the running of the algorithm. However, at the beginning, to
generate a new configuration ARTIST makes n random incremental changes to the old
configuration. Hence, if n is large enough, ARTIST evaluates almost random
configurations. As ARTIST progresses, n is made smaller until it is only 1 (incremental
configuration changes). Using initial values of 5 and 10 for n, we obtained the results
given in Figure 12 for the same module used to obtain the results given in Figure 11.


Figure 12. Random Annealing


Again, each line of Figure 12 represents the best configuration found using the indicated
number of trials. The results show that the hybrid algorithm is better than either a
solely random or solely annealing approach. The best results were obtained for n equal
to 10. These results could be consistently reproduced for different modules.
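The hybrid scheme can be sketched in Python as simulated annealing in which each new configuration is produced by n random pairwise swaps, with n shrinking to 1 over the run. Several pieces are assumptions made for illustration: the text does not give the schedule by which n decreases (a linear one is used here), the configuration is reduced to a single permutation, toy_cost stands in for layout size, and the cooling parameters are arbitrary.

```python
import math
import random


def toy_cost(order):
    """Stand-in for layout size (an assumption): total distance between neighbors."""
    return sum(abs(a - b) for a, b in zip(order, order[1:]))


def swaps_for_trial(k, trials, n_start):
    """Assumed linear schedule: n_start swaps at the first trial, 1 at the last."""
    return max(1, round(n_start * (1 - k / trials)))


def perturb(order, n_swaps, rng):
    """Apply n_swaps random pairwise swaps to a copy of the configuration."""
    cand = order[:]
    for _ in range(n_swaps):
        i, j = rng.sample(range(len(cand)), 2)
        cand[i], cand[j] = cand[j], cand[i]
    return cand


def hybrid_anneal(cost, n_items, trials, rng, n_start=10, t0=5.0, cooling=0.995):
    cur = list(range(n_items))
    rng.shuffle(cur)
    cur_cost = cost(cur)
    best, best_cost = cur[:], cur_cost
    temp = t0
    for k in range(trials):
        # Early trials make many changes (nearly random configurations);
        # late trials make a single incremental change.
        cand = perturb(cur, swaps_for_trial(k, trials, n_start), rng)
        c = cost(cand)
        if c <= cur_cost or rng.random() < math.exp((cur_cost - c) / temp):
            cur, cur_cost = cand, c
            if c < best_cost:
                best, best_cost = cand[:], c
        temp = max(temp * cooling, 1e-9)  # geometric cooling with a floor
    return best, best_cost


rng = random.Random(0)
print(hybrid_anneal(toy_cost, 8, 500, rng, n_start=10))
```

With this structure there are two schedules to tune, one for the temperature and one for n, and there is no explicit switch point between a random phase and an annealing phase.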

FUTURE  DIRECTIONS 

Our  investigation  into  this  area  is  actually  far  from  over  as  we  must  now  deal  with 
two  cooling  schedules.  However,  the  control  of  randomness  does  appear  to  be  the  more 
critical  issue.  We  plan  to  further  analyze  the  behavior  of  our  current  hybrid  algorithm 
and  to  try  other  algorithms.  We  plan  to  expand  ARTIST  so  that  it  can  handle  stacks 
of  mesh  arrays  (several  mesh  arrays  stacked  one  on  top  of  the  other)  and  to  improve  the 
performance  of  ARTIST  so  larger  layouts  can  be  handled. 

At present, ARTIST assumes that logically equivalent circuits are likewise
physically equivalent (produce the same behavior). This is not always the case. Consider, for
example, charge sharing. Because of charge sharing, the following two circuits are not
physically equivalent, owing to the capacitances at A and B.

We plan to continue to investigate this problem.